(54) A robust pitch estimation method and device using the method for telephone speech

(57) The present invention provides a pitch estimat- 
ing method and device for accurately estimating the 
pitch of digitized speech signals, in spite of the presence 
of contaminants and distortions in telephone speech 
signals by (1) determining a set of pitch candidates to 
estimate a pitch of the digitized speech signal at each 
of a plurality of time instants, wherein series of these 
time instants define segments of the digitized speech 
signal; (2) constructing a pitch contour using a pitch can- 
didate selected from each of the sets of pitch candidates 
determined in the first step; and (3) selecting a repre- 
sentative pitch estimate for the digitized speech signal 
segment from the set of pitch candidates comprising the 
pitch contour.
Description

BACKGROUND OF THE INVENTION 

Pitch estimation devices have a broad range of applications in the
field of digital speech processing, including use in digital coders
and decoders, voice response systems, speaker and speech recognition
systems, and speech signal enhancement systems. A primary practical
use of these applications is in the field of telecommunications, and
the present invention relates to pitch estimation of telephonic
speech.

The increasing applications for speech processing 
have led to a growing need for high-quality efficient dig- 
itization of speech signals. Because digitized speech 
sounds can consume large amounts of signal band- 
widths, many techniques have been developed in recent 
years for reducing the amount of information needed to 
transmit or store the signal in such a way that it can later 
be accurately reconstructed. These techniques have fo- 
cused on creating a coding system to permit the signal 
to be transmitted or stored in code, which can be decod- 
ed for later retrieval or reconstruction. 

One modern technique is known as Code Excited 
Linear Predictive coding ("CELP"), which utilizes an "ex- 
citation codebook" of "codevectors," usually in the form 
of a table of equal length, linearly independent vectors 
to represent the excitation signal. Recently developed 
CELP systems typically codify a signal, frame by frame, 
as a series of indices of the codebook (representing a 
series of codevectors), selected by filtering the codevec- 
tors to model the frequency shaping effects of the vocal 
tract, comparing the filtered codevectors with the digi- 
tized samples of the signal, and choosing the codevec- 
tor closest to it. 

Pitch estimation is a critical factor in accurately modeling and
coding an input speech signal. Prior art pitch estimation devices
have attempted to optimize the pitch estimate by known methods such
as covariance or autocorrelation of the speech signal after it has
been filtered to remove the frequency shaping effects of the vocal
tract. However, the reliability of these existing devices is limited
by an additional difficulty in accurately digitizing telephone speech
signals, which are often contaminated by non-stationary spurious
background noise and nonlinearities due to echo suppressors, acoustic
transducers and other network elements.

Accordingly, there is a need for a method and device 
that accurately estimates the pitch of speech signals, in 
spite of the presence of non-stationary contaminants 
and distortion. 

SUMMARY OF THE INVENTION 

The present invention provides a pitch estimating method and device
for estimating the pitch of speech signals, in spite of the presence
of contaminants and distortions in telephone speech signals. More
particularly, the present invention provides a pitch estimating
method and device capable of providing an accurate pitch estimate, in
spite of the presence of non-stationary spurious contamination,
having potential use in any speech processing application.

Specifically, the present invention provides a method of estimating
the pitch in a digitized speech signal comprising the steps of: (1)
determining a set of pitch candidates to estimate a pitch of the
digitized speech signal at each of a plurality of time instants,
wherein series of these time instants define segments of the
digitized speech signal; (2) constructing a pitch contour using a
pitch candidate selected from each of the sets of pitch candidates;
and (3) selecting a representative pitch estimate for each digitized
speech signal segment from the selected pitch candidates comprising
the pitch contour.

Additionally, the present invention provides a pitch estimator for
speech signals comprising a clock for measuring a series of time
instants; a sampler coupled to the clock for receiving the speech
signals and generating a series of digitized speech segments
corresponding to the series of time instants received from the clock;
a register for producing a plurality of different pitch candidates; a
pitch candidate determinator coupled to the register for receiving
the series of digitized speech segments and selecting a plurality of
pitch candidates from the register to approximate pitch values for
the digitized speech segments; a pitch contour estimator coupled to
the pitch candidate determinator for constructing a pitch contour
from the pitch candidates selected by the pitch candidate
determinator; and a pitch estimate selector coupled to the pitch
contour estimator for selecting a pitch estimate from the pitch
contour representative of the digitized speech segments.

The invention itself, together with further objects and attendant
advantages, will be understood by reference to the following detailed
description, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram illustrating application of the present
invention in a low-rate multi-mode CELP encoder.

Figure 2 is a block diagram illustrating the preferred method of
pitch estimation in accordance with the present invention.

Figure 3 is a flow chart illustrating the pitch candidate
determination stage shown in Figure 2 in greater detail.

Figure 4 is a timing diagram illustrating the pitch candidate
determination stage shown in Figures 2 and 3.

Figure 5 is a flow chart illustrating the path metric computation in
accordance with the present invention.

Figure 6 is a flow chart illustrating the representative pitch
candidate selection as provided by the present invention.
DETAILED DESCRIPTION OF THE DRAWINGS

The present invention is a pitch estimating method and device that
provides a robust pitch estimate of an input speech signal even in
the presence of contaminants and distortion. Pitch estimation is one
of the most
important problems in speech processing because of its 
use in vocoders, voice response systems and speaker 
identification and verification systems, as well as other 
types of speech related systems currently used or being 
developed. 

While the drawings present a conceptualized breakdown of the present
invention, the preferred embodiment of the present invention
implements these steps through program statements rather than
physical hardware components. Specifically, the preferred embodiment
comprises a TI 320C31 digital signal processor, which executes a set
of prestored instructions on a digitized speech signal, sampled at
8 kHz, and outputs a representative pitch estimate for every 22.5
msec segment of the signal. However, one skilled in the art will
recognize that the present invention may also be readily embodied in
hardware, so the fact that the preferred embodiment takes the form of
software program statements should not be construed as limiting the
scope of the present invention.

Turning now to the drawings, Figure 1 is provided 
to illustrate a possible application of the present inven- 
tion. Figure 1 shows use of the present invention in a 
low-rate multi-mode CELP encoder. As illustrated, a dig- 
itized, bandpass filtered speech signal 51a sampled at 
8 kHz is input to the Pitch Estimation module 53 of the 
present invention. Also input to the Pitch Estimation 
module 53 are linear prediction coefficients 52a that 
model the frequency shaping effects of the vocal tract. 
These procedures are known in the art. 

The Pitch Estimation module 53 of the present in- 
vention outputs a representative pitch estimate 53a for 
each segment of the input signal, which has two uses 
in the CELP encoder illustrated in Figure 1: First, the 
representative pitch estimate 53a aids the Mode Clas- 
sification module 54 in determining whether the signal 
represented in that speech segment consists of voiced 
speech, unvoiced speech or background noise, as explained in the
prior art. See, for example, the paper of K. Swaminathan et al.,
"Speech and Channel Codec Candidate for the Half Rate Digital
Cellular Channel," presented at the 1994 ICASSP Conference in
Adelaide, Australia. If the signal is unvoiced speech or background
noise, the representative pitch estimate 53a has no further use.
However, if the signal is classified as voiced speech, the
representative pitch estimate 53a aids in encoding the signal, as
indicated by the input to the CELP Encoder for Voiced Speech module
55 in Figure 1, which then outputs the compressed speech 56. Those
with ordinary skill in the art are aware that numerous encoding
methods have been developed in recent years, and the above referenced
paper further describes aspects of encoders.

After the speech signal is encoded as compressed speech 56, it may be
stored or transmitted as required.

Figure 2 shows a block diagram of the Pitch Estimation module 53 of
Figure 1, which is the focus of the present invention. As shown,
after receiving the Speech Signal 51a and Filter Coefficients 52a
resulting from the linear prediction analysis 52, the present
invention estimates the signal pitch in three stages: First, the
Pitch Candidate Determination module 10 determines a set of pitch
candidates P 10a to represent the pitch of the speech signal 51a, and
calculates cross-correlation values 10b corresponding to each member
of the pitch candidate set P 10a. Second, the Optimal Pitch Contour
Estimation module 20 selects optimal pitch candidates 20a from among
pitch candidate set P 10a based in part on the cross-correlation
values 10b. Finally, in the third stage, the Representative Pitch
Estimate Selector module 30 selects a representative pitch estimate
53a from among the optimal pitch candidates 20a to provide an overall
pitch estimation for the signal segment being analyzed.

The three stages of pitch estimation will now be discussed in greater
detail, with reference to the drawings. As shown in Figure 3, in the
first stage of pitch estimation provided by the present invention,
the pitch of the Speech Signal S(n) 51a is estimated by analyzing the
Speech Signal S(n) 51a with a combination of inverse filtering and
cross-correlation, respectively represented by the Inverse Filter
module 12 and the Cross-Correlation module 14.

Speech Signal S(n) 51a is analyzed in segments defined by time
instants j 11a, which in turn are determined by a clock 11. In the
preferred embodiment, Speech Signal S(n) 51a is a digitized speech
signal sampled at a frequency of 8 kHz (where n represents the time
of each sample, one every 0.125 msec at a sampling frequency of
8 kHz). The preferred embodiment of the present invention further
defines segments at 22.5 msec intervals and time instants at 7.5 msec
intervals. Figure 4 shows a timing diagram of the preferred
embodiment, further showing the time instants in alignment with the
boundaries of the speech signal segment.

Referring now to both Figures 3 and 4, this first stage of pitch
estimation provided by the present invention determines a set of
pitch candidates P 10a at each time instant j 11a by evaluating
Speech Signal S(n) 51a along with the Filter Coefficients a(L) 52a
determined by linear prediction analysis 52 (as discussed above with
reference to Figure 2). The Inverse Filter module 12 performs this
analysis during an inverse filter period (which, in the preferred
embodiment shown in Figure 4, starts 7.5 msec into the signal segment
and continues 7.5 msec after the signal segment ends). Residual
Signal r(n) 12a is then output, where:

$$r(n) = \sum_{L=0}^{M} S(n-L)\,a(L)$$

and M is the linear prediction filter order. This process is well
known to those with ordinary skill in the art.
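
The inverse filtering step can be illustrated with a minimal Python sketch. It is not the embodiment's DSP code: the function name `inverse_filter`, the use of plain lists, and the choice to treat samples before the start of the buffer as zero are all assumptions made for illustration.

```python
def inverse_filter(speech, coeffs):
    """Compute r(n) as the sum over L = 0..M of S(n-L) * a(L).

    speech: list of samples S(n)
    coeffs: list of filter coefficients a(0)..a(M), so M = len(coeffs) - 1
    Assumption: samples before the start of `speech` are taken as zero.
    """
    order = len(coeffs) - 1
    residual = []
    for n in range(len(speech)):
        acc = 0.0
        for lag in range(order + 1):
            if n - lag >= 0:  # skip history that is not available
                acc += speech[n - lag] * coeffs[lag]
        residual.append(acc)
    return residual


# A first-order filter a = [1, -1] turns a ramp into a constant residual.
ramp_residual = inverse_filter([1.0, 2.0, 3.0, 4.0], [1.0, -1.0])
```

In a real coder the filter history would carry over from the preceding samples rather than being zeroed.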

Inverse filtered Residual Signal r(n) 12a is then 
cross-correlated within a 15 msec pitch estimation peri- 
od centered around each time instant, as shown in the 
timing diagram of Figure 4. 

Thus, for signal segment A, a set of pitch candidates is determined
for 5 time instants: the first 7.5 msec prior to the segment
beginning boundary ($j_A = 0$), the second at the segment beginning
boundary ($j_A = 1$), the third 7.5 msec into the segment
($j_A = 2$), the fourth 15 msec into the segment ($j_A = 3$), and the
last at the segment end ($j_A = 4$). One should note that in
evaluating any but the first segment of a speech signal, such as
signal segment B in Figure 4, the sets of pitch candidates for
$j_B = 0$ and $j_B = 1$ have already been calculated respectively as
$j_A = 3$ and $j_A = 4$ of the previous segment, thus eliminating the
need for reevaluation and reducing the real time cost of this first
stage.
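
The timing of the preferred embodiment can be checked with a short Python sketch. The constant and function names are hypothetical; only the 8 kHz rate, the 22.5 msec segment, the 7.5 msec instant spacing, the 15 msec estimation period and the five instants per segment come from the description.

```python
SAMPLE_RATE_HZ = 8000  # sampling frequency of the preferred embodiment


def ms_to_samples(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


SEGMENT_SAMPLES = ms_to_samples(22.5)  # one speech segment
INSTANT_SPACING = ms_to_samples(7.5)   # spacing between time instants
WINDOW_SAMPLES = ms_to_samples(15.0)   # pitch estimation period per instant


def time_instant_offsets(num_instants=5):
    """Sample offsets of instants j = 0..4 relative to the segment start.

    j = 0 lies one spacing before the segment start, j = 1 at the start,
    and j = 4 at the segment end.
    """
    return [(j - 1) * INSTANT_SPACING for j in range(num_instants)]


def shared_instants():
    """Pairs (j in segment A, j in segment B) that fall on the same sample."""
    current = time_instant_offsets()
    following = [SEGMENT_SAMPLES + offset for offset in current]
    return [(ja, jb)
            for ja, a in enumerate(current)
            for jb, b in enumerate(following)
            if a == b]
```

Under these figures a segment spans 180 samples and the instants of segment B at j = 0 and j = 1 coincide with j = 3 and j = 4 of segment A, which is why their candidates need not be recomputed.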

In the preferred embodiment as illustrated in Figure 3, a set of
possible pitch values for an input speech signal is predetermined and
stored in such a way as to be easily accessed, such as in a table 13
or a register. The cross-correlation for a potential pitch value p
13a at a time instant j 11a is calculated according to the formula:

$$\sigma(p,j) = \sum_{n} r(n)\,r(n-p)$$

where n represents the time of each sample during the time span of
time instant j and $P_{min} \le p \le P_{max}$, where $P_{min}$
represents the minimum possible pitch value in Pitch Value Table 13
and $P_{max}$ represents the maximum possible pitch value in Pitch
Value Table 13.
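
As a rough illustration, the cross-correlation of the residual with a lagged copy of itself can be sketched in Python as follows. The function names, the half-open window convention and the skipping of out-of-range samples are assumptions, not details taken from the embodiment.

```python
def cross_correlation(residual, center, half_window, lag):
    """Sum r(n) * r(n - lag) for n in a window centered on `center`.

    The window covers center - half_window up to, but not including,
    center + half_window. Samples that fall outside the residual buffer
    are skipped (an assumption made for this sketch).
    """
    total = 0.0
    for n in range(center - half_window, center + half_window):
        if 0 <= n < len(residual) and 0 <= n - lag < len(residual):
            total += residual[n] * residual[n - lag]
    return total


def correlation_table(residual, center, half_window, pitch_values):
    """Cross-correlation for every pitch value in a pitch value table."""
    return {p: cross_correlation(residual, center, half_window, p)
            for p in pitch_values}


# An impulse train with a period of 4 samples correlates at lags 4 and 8.
impulses = [1.0, 0.0, 0.0, 0.0] * 10
table = correlation_table(impulses, center=20, half_window=8,
                          pitch_values=[3, 4, 5, 8])
```

With an 8 kHz signal and a 15 msec estimation period, `half_window` would be 60 samples.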

After Cross-Correlation module 14 calculates cross-correlation values
$\sigma(p,j)$ 14a for pitch values p 14b at a particular time instant
j 11a, Peak Selection module 15 determines a set of pitch candidates
P 10a, each representing a pitch value stored in Pitch Value Table
13, to estimate the speech signal pitch at that time instant j 11a.
Only those "peak" pitch values with the highest cross-correlation
values are chosen as pitch candidates.
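
One plausible reading of peak selection is sketched below in Python. The description does not define a "peak" precisely, so the local-maximum test used here, the function name and the default of two candidates per time instant are assumptions for illustration only.

```python
def select_pitch_candidates(pitch_values, correlations, num_candidates=2):
    """Return up to `num_candidates` (pitch, correlation) pairs.

    A table entry counts as a peak when its correlation is greater than
    that of the previous entry and not smaller than that of the next
    entry (an assumed definition). Peaks are ranked by correlation,
    highest first.
    """
    peaks = []
    last = len(pitch_values) - 1
    for t, value in enumerate(correlations):
        rises = t == 0 or value > correlations[t - 1]
        falls = t == last or value >= correlations[t + 1]
        if rises and falls:
            peaks.append((pitch_values[t], value))
    peaks.sort(key=lambda pair: pair[1], reverse=True)
    return peaks[:num_candidates]


candidates = select_pitch_candidates(
    [20, 21, 22, 23, 24, 25, 26],
    [0.1, 0.5, 0.2, 0.3, 0.9, 0.4, 0.6],
)
```

Here the peaks are at pitch values 21, 24 and 26, and the two with the highest correlation are kept.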

Each member of the set P 10a can be represented as P(i,j), where i is
the index into set P 10a and j represents the time instant. (In the
preferred embodiment, $0 \le i < 2$, indicating that two pitch values
are chosen as pitch candidates to represent the signal at each time
instant.) Additionally, for each member P(i,j), the cross-correlation
value $\sigma(P(i,j),j)$ 14a will hereinafter be denoted simply as
$\rho(i,j)$ 10b.

One skilled in the art will recognize that there are numerous methods
for storing set P 10a, and this invention should not be construed to
be limited to specific methods. For example, the pitch value
represented by each P(i,j) may be stored in a memory cache or
register, or may be referenced by the appropriate entry in the Pitch
Value Table 13.

Those skilled in the art will also recognize that while the pitch
candidates at the end of the first stage do account for any
stationary background noise that may be present in the signal, like
prior art pitch estimators, they cannot account for non-stationary
spurious contamination. Thus, the present invention goes beyond known
pitch estimation by providing a second stage of pitch estimation,
constructing an optimal pitch contour for the speech signal from
optimal pitch candidates, which are selected from each set of pitch
candidates P estimating the pitch of the speech signal at time
instant j, as determined in the first stage.

In this second stage, before selecting a particular pitch candidate
as the optimal candidate for a particular time instant, the pitch
candidates generated for surrounding time instants are also
considered. If a particular pitch candidate is inconsistent with the
overall contour of the pitch candidates suggested over a period of
time, the pitch candidate is likely to reflect non-stationary
noise-contaminated speech rather than the speech signal, and is
therefore not chosen as the optimal candidate.

P(i,j) designates the ith pitch candidate found for time instant j,
where $N_p$ pitch candidates were found for each of $M_p$ time
instants. The ultimate objective of this second stage is to select
one of the $N_p$ pitch candidates for each of the $M_p$ time instants
to create an optimal pitch contour that is the closest fit to the
path of the pitch trajectory of the speech signal, taking into
account pitch estimate errors caused by spurious contaminants and
distortion. The pitch candidate selected is designated as the
"optimal" pitch candidate.

First, branch metric analysis is conducted to measure the distortion
of the transition from each pitch candidate P(i,j-1) at time instant
j-1 to each pitch candidate P(k,j) at time instant j. In the
preferred embodiment of this invention, this calculation is
formulated as:

$$C(i,k,j) = -\rho(i,j-1)\cdot\rho(k,j)$$

where $0 \le i,k < N_p$ (where i and k are indices into the set of
pitch candidates), $0 < j < M_p$ and $\rho$ represents the
cross-correlation calculated in the first stage as previously
explained. This particular formula was chosen for the preferred
embodiment because it provides good results and is easy to implement.
One with ordinary skill in the art will recognize that the above
formula is merely exemplary, and its use should not be construed as
limiting the scope of the present invention.

Using this cost function, the overall path metric is determined,
which measures the distortion d(k,j) for a pitch trajectory over the
period from the initial time instant to time instant j, leading to
pitch candidate P(k,j). The path metric is initialized for the first
time instant (j=0) by setting:

$$d(k,0) = -\rho(k,0), \quad 0 \le k < N_p$$

where k is the index into the set of pitch candidates generated for
time instant j=0. Optimal path metrics are then calculated for d(k,j)
for all k and all j (where $0 < j < M_p$), using the formula:

$$d(k,j) = \min_{0 \le i < N_p} \bigl( d(i,j-1) + C(i,k,j) \bigr)$$

where $0 \le k < N_p$, $0 < j < M_p$.

Once the path metric d(k,j) for each pitch candidate k at each time
instant j is determined, the optimal mapping is recorded as:

$$I(k,j) = i_{min}, \quad 0 \le k < N_p, \quad 0 < j < M_p$$

where $i_{min}$ is the index for which
$d(k,j) = d(i_{min},j-1) + C(i_{min},k,j)$.
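
The path metric recursion can be sketched in Python as a dynamic-programming forward pass. This sketch assumes the branch metric is the negated product of the two cross-correlation values, which is one reading of the formula in the source. The function name, the nested-list layout `rho[j][i]` and the tie-breaking toward the lower index are illustrative choices.

```python
def path_metrics(rho):
    """Forward pass over the candidate trellis.

    rho[j][i] is the cross-correlation of the i-th pitch candidate at
    time instant j. Returns (d, back): d[j][k] is the path metric of
    candidate k at instant j, and back[j][k] is the recorded index of
    the best predecessor at instant j - 1 (None at j = 0).
    """
    d = [[-value for value in rho[0]]]   # initialization at j = 0
    back = [[None] * len(rho[0])]
    for j in range(1, len(rho)):
        d_row, back_row = [], []
        for rho_kj in rho[j]:
            best_i, best_cost = 0, None
            for i, rho_prev in enumerate(rho[j - 1]):
                # Assumed branch metric: negated product of correlations.
                cost = d[j - 1][i] + (-rho_prev * rho_kj)
                if best_cost is None or cost < best_cost:
                    best_i, best_cost = i, cost  # ties keep the lower index
            d_row.append(best_cost)
            back_row.append(best_i)
        d.append(d_row)
        back.append(back_row)
    return d, back


d, back = path_metrics([[0.5, 0.25], [0.5, 1.0]])
```

Each time instant needs only the metrics of the previous one, so the cost grows with the number of time instants times the square of the number of candidates per instant.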

Figure 5 illustrates path metric analysis, where there are two pitch
candidates chosen to represent the signal pitch at each time instant
($N_p = 2$), and the signal is analyzed in segments defined by five
time instants ($M_p = 5$). The example illustrated shows derivation
of the path metric to pitch candidate P(0,3) (i.e., the first of the
two pitch candidates for time instant j=3).

By the time d(0,3) is being calculated, d(i,2) has already been
calculated for all i. As indicated in Figure 5, $d_0$ 21a represents
[d(0,2) + C(0,0,3)] and $d_1$ 21b represents [d(1,2) + C(1,0,3)].
These sums $d_0$ 21a and $d_1$ 21b are compared and d(0,3) is
assigned the value $\min(d_0, d_1)$ 22. I(0,3) is then set to 0 if
$d_0 \le d_1$ 23a, or to 1 if $d_0 > d_1$ 23b.

In this example, after d(0,3) and I(0,3) are determined and recorded,
d(1,3) and I(1,3) are similarly determined and recorded before going
on to determine the path metric for the next time instant d(i,4), for
all values of i.

Once all the path metrics are calculated for each time instant and
pitch candidate in the signal segment, a traceback procedure is used
to obtain optimal pitch candidates for each time instant j as
follows:

$$i_{opt}(j) = I(i_{opt}(j+1),\, j+1)$$

where $0 < j+1 < M_p$, with the boundary condition that
$i_{opt}(M_p-1)$ is the value for which
$d(i_{opt}(M_p-1),\, M_p-1) = \min_{0 \le k < N_p} d(k,\, M_p-1)$.
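
The traceback can be sketched in Python as a backward walk over the recorded predecessor indices. The function name and the list-of-lists layout (`d[j][k]` for path metrics, `back[j][k]` for recorded predecessors) are assumptions for illustration.

```python
def traceback(d, back):
    """Recover the optimal candidate index for every time instant.

    d[j][k] is the path metric of candidate k at instant j and
    back[j][k] is the recorded best predecessor of that candidate.
    The walk starts from the candidate with the smallest path metric
    at the final instant and follows the recorded indices backwards.
    """
    last = len(d) - 1
    i_opt = [0] * len(d)
    i_opt[last] = min(range(len(d[last])), key=lambda k: d[last][k])
    for j in range(last - 1, -1, -1):
        i_opt[j] = back[j + 1][i_opt[j + 1]]
    return i_opt


example_d = [[-0.5, -0.25], [-0.75, -1.0], [-1.5, -1.2]]
example_back = [[None, None], [0, 0], [1, 0]]
optimal_indices = traceback(example_d, example_back)
```

In this example the best final candidate is index 0, whose recorded predecessor at the middle instant is index 1, which in turn points to index 0 at the first instant.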

The pitch candidate $P_j = P(i_{opt}(j), j)$ for time instants j,
where $0 \le j < M_p$, is selected from each set P determined in the
first stage of the pitch estimation provided by the present
invention. The set of all $P_j$ for $0 \le j < M_p$ defines the
optimal pitch contour of the speech signal segment being analyzed,
and as with the set P, numerous methods to store this set of pitch
candidates $P_j$ will be obvious to those skilled in the art.

A flow chart of the representative pitch estimate se- 
lection, the third and final stage of the pitch estimation 
provided by the present invention, is shown in Figure 6. 
As discussed in greater detail below, if the pitch of the 
speech signal during the segment being analyzed is rel- 
atively stable, a single overall pitch estimate will be de- 
rived by taking an approximate modal average of the 
optimal pitch candidates, taking into account the possi- 
bility that some of these optimal pitch candidates may 
be in slight error or could suffer from pitch doubling or 
pitch halving. If the signal pitch is determined to be insufficiently
stable over the signal segment being analyzed, a pitch estimate will
not be reliable and no pitch estimation will be made by the present
invention.

By this stage, optimal pitch candidates $P_j$ for each time instant j
($0 \le j < M_p$) have already been selected. The third stage of
pitch estimation as provided by the present invention now computes a
distance metric $\delta_{j,l}$ for each pair $P_j$ and $P_l$ (where
j, l represent time instants), as illustrated in Figure 6, 32a, 32b,
32c, and 33:

$$\delta_{j,l,0} = |P_j - P_l|$$
$$\delta_{j,l,1} = |P_j - 2P_l|$$
$$\delta_{j,l,2} = |2P_j - P_l|$$
$$\delta_{j,l} = \min(\delta_{j,l,0},\, \delta_{j,l,1},\, \delta_{j,l,2})$$

The distance metric $\delta_{j,l}$ 33 is an indication of the
variation in pitch between time instants within the signal segment
being analyzed, and a lower value reflects less variation and
suggests that pitch estimation for the overall signal segment may be
appropriate. Accordingly, in this stage of the present invention, for
every pitch estimate $P_j$, a counter C(j) is initialized at 0 31,
and is incremented 35 each time $\delta_{j,l}$ for $0 \le l < M_p$
falls below a predetermined threshold $\delta_T$ 34.

This process is repeated for all values of j and l, where
$0 \le j,l < M_p$ 36, 37, 40, 41. As these calculations are completed
for each j, pitch estimate PE is set to the pitch value represented
by $P_j$ if the counter C(j) is the highest counter value calculated
so far 39. Once all such calculations are completed, if $C_{max}$,
the highest value of C(j) for all j, 38, 39, exceeds a predetermined
minimum acceptable value $C_T$ 42, pitch estimate PE is selected as
the representative pitch estimate for that signal segment 42b. If
$C_{max}$ does not exceed predetermined minimum acceptable value
$C_T$ 42, the pitch estimate is discarded as unreliable 42a. As one
skilled in the art will recognize, a state of having no reliable
pitch estimate can be signalled by various methods, such as
generating a specific error signal or by assigning an impossible
pitch value (i.e., greater than $P_{max}$ or less than $P_{min}$).
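
The third stage can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the distance combines the direct, doubled and halved comparisons described for the distance metric, and `None` stands in for the "no reliable estimate" signal, which the description leaves open.

```python
def pitch_distance(p_j, p_l):
    """Smallest of the direct, pitch-doubling and pitch-halving distances."""
    return min(abs(p_j - p_l), abs(p_j - 2 * p_l), abs(2 * p_j - p_l))


def representative_pitch(contour, delta_threshold, min_count):
    """Pick the contour value that agrees with the most other instants.

    contour: optimal pitch candidates, one per time instant
    delta_threshold: distances strictly below this increment the counter
    min_count: the winning counter must strictly exceed this value
    Returns the representative pitch, or None when the pitch is judged
    too unstable (None is an assumed way of flagging that state).
    """
    best_pitch, best_count = None, -1
    for p_j in contour:
        # The comparison of a value with itself is counted, since the
        # description ranges over every time instant of the segment.
        count = sum(1 for p_l in contour
                    if pitch_distance(p_j, p_l) < delta_threshold)
        if count > best_count:  # the earliest instant wins ties
            best_pitch, best_count = p_j, count
    return best_pitch if best_count > min_count else None


stable = representative_pitch([40, 41, 80, 40, 20], 3, 3)
unstable = representative_pitch([20, 47, 75, 110, 33], 3, 3)
```

In the first contour the values 80 and 20 are treated as doubling and halving of a pitch near 40, so every instant agrees with 40. In the second no value finds support beyond itself and no estimate is returned.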
The pitch estimating device and method of the present invention
provide numerous advantages by adding the second and third stages to
conventional pitch estimation because, as shown above, these
additional measures permit a more accurate representation of speech
signals even if non-stationary distortion is present, which prior art
pitch estimation could not achieve.

Of course, it should be understood that a wide range of changes and
modifications can be made to the preferred embodiment described
above. It is therefore intended that the foregoing detailed
description be regarded as illustrative rather than limiting and that
it be understood that it is the following claims, including all
equivalents, which are intended to define the scope of this
invention.
Claims

1. A method of estimating the pitch of a digitized speech signal
(51a) comprising the steps of:

determining a set of pitch candidates (10a) to estimate the pitch of
the digitized speech signal (51a) at each of a plurality of time
instants, wherein series of the time instants define segments of the
digitized speech signal (51a);

constructing a pitch contour for the digitized speech signal segments
using a selected pitch candidate (20a) from each of the sets of pitch
candidates (10a);

selecting a representative pitch estimate (53a) for each of the
digitized speech signal segments from the selected pitch candidates
(20a) comprising the pitch contour.

2. The method of pitch estimation according to claim 1, wherein the
time instants are defined at 7.5 msec intervals.

3. The method of pitch estimation according to claims 1 or 2, wherein
the digitized speech signal segments have a duration of 22.5 msec.

4. The method of pitch estimation according to any one or more of
claims 1, 2 or 3, wherein the step of determining the set of pitch
candidates (10a) comprises use of linear prediction analysis (52) to
determine filter coefficients (52a) to approximate the digitized
speech signal (51a).

5. The method of pitch estimation according to claim 4, wherein the
step of determining the set of pitch candidates includes inverse
filtering the digitized speech signal (51a) using the filter
coefficients (52a), and cross-correlating the inverse filtered
digitized speech signal.

6. The method of pitch estimation according to any one or more of
claims 1, 2, 3, 4 or 5, wherein the step of constructing the pitch
contour comprises determining the selected pitch candidate from each
of the pitch candidate sets (10a), the pitch candidate having a
minimum path metric distortion value (20a).

7. The method of pitch estimation according to any one or more of
claims 1, 2, 3, 4, 5 or 6, wherein the step of selecting the
representative pitch estimate (53a) for each of the digitized speech
signal segments comprises calculating a distance metric value for
each pair of selected pitch candidates (20a) comprising the pitch
contour of the digitized speech segment, and selecting as the
representative pitch estimate (53a), the selected pitch candidate
(20a) having a maximum number of distance metric values falling below
a predetermined threshold.

8. The method of pitch estimation according to claim 7 further
comprising the step of generating an error signal (42a) if the
maximum number of distance metric values falling below said
predetermined threshold for the selected representative pitch
estimate does not exceed a predetermined minimum acceptable value.

9. A pitch estimator for speech signals comprising:

a clock (11) for measuring a series of time instants;

a sampler (50) coupled to the clock (11) for receiving the speech
signals and generating a series of digitized speech segments (51a)
corresponding to the series of time instants received from the clock
(11);

a register (13) for producing a plurality of different pitch
candidates (13a);

a pitch candidate determinator (10) coupled to the register (13) for
receiving the series of digitized speech segments (51a) and selecting
a plurality of pitch candidates (10a) from the register (13) to
approximate pitch values for the digitized speech segments;

a pitch contour estimator (20) coupled to the pitch candidate
determinator (10) for constructing a pitch contour (20a) from the
pitch candidates (10a) selected by the pitch candidate determinator
(10);

a pitch estimate selector (30) coupled to the pitch contour estimator
(20) for selecting a pitch estimate (53a) from the pitch contour
(20a) representative of the digitized speech segments.

10. The pitch estimator according to claim 9, wherein the pitch
contour estimator (20) calculates a path metric value measuring
distortion for a pitch trajectory of the digitized speech segments
for the pitch candidates (10a) selected by the pitch candidate
determinator (10), and selects the pitch candidates (20a)
corresponding to the minimum path metric distortion values.



EP 0 712 116 A2 







[Figure 5: flow chart of the path metric computation]

INPUT CONDITIONS: Time Instant j = 3; Pitch Candidate index k = 0;
N_p = 2; M_p = 5

Repeat for Time Instant j = 3, Pitch Candidate index k = 1, before
calculating the path metric for Time Instant j = 4.